media data
Traffic and Mobility Optimization Using AI: Comparative Study between Dubai and Riyadh
Urban planning plays a very important role in the development of modern cities. It affects economic growth, quality of life, and environmental sustainability. Modern cities face challenges in managing traffic congestion, which arise due to rapid urbanization. In this study we explore how AI can be used to understand traffic and mobility issues and their effects on residents' sentiment. The approach combines real-time traffic data with geo-located sentiment analysis, offering a comprehensive and dynamic approach to urban mobility planning. AI models and exploratory data analysis were used to predict traffic congestion patterns, analyze commuter behaviors, and identify congestion hotspots and dissatisfaction zones. The findings offer actionable recommendations for optimizing traffic flow, enhancing commuter experiences, and addressing city-specific mobility challenges in the Middle East and beyond.
Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input
Masis, Tessa, O'Connor, Brendan
Geo-entity linking is the task of linking a location mention to the real-world geographic location. In this paper we explore the challenging task of geo-entity linking for noisy, multilingual social media data. There are few open-source multilingual geo-entity linking tools available and existing ones are often rule-based, which break easily in social media settings, or LLM-based, which are too expensive for large-scale datasets. We present a method which represents real-world locations as averaged embeddings from labeled user-input location names and allows for selective prediction via an interpretable confidence score. We show that our approach improves geo-entity linking on a global and multilingual social media dataset, and discuss progress and problems with evaluating at different geographic granularities.
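The idea of representing each location as an averaged embedding of its labeled user-input names, with a confidence score used for selective prediction, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the character n-gram `embed` function stands in for whatever learned text encoder is actually used, cosine similarity to the best prototype stands in for the confidence score, and the `threshold` value and the tiny labeled set are hypothetical.

```python
import math
from collections import Counter, defaultdict


def embed(name, n=3):
    """Toy embedding: L2-normalised character n-gram counts.

    Assumption: this stands in for a learned multilingual text encoder.
    """
    padded = f"  {name.lower().strip()}  "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {gram: v / norm for gram, v in counts.items()}


def dot(a, b):
    # Sparse dot product over the n-grams of the first vector.
    return sum(v * b.get(gram, 0.0) for gram, v in a.items())


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def build_prototypes(labeled):
    """Average the embeddings of all names labeled with the same location."""
    sums, counts = defaultdict(Counter), Counter()
    for name, location in labeled:
        for gram, v in embed(name).items():
            sums[location][gram] += v
        counts[location] += 1
    return {
        location: {gram: v / counts[location] for gram, v in vec.items()}
        for location, vec in sums.items()
    }


def link(mention, prototypes, threshold=0.3):
    """Return (location, score), or (None, score) when confidence is too low."""
    query = embed(mention)
    score, location = max(
        (cosine(query, proto), loc) for loc, proto in prototypes.items()
    )
    return (location, score) if score >= threshold else (None, score)


# Hypothetical labeled user-input location names.
LABELED = [
    ("NYC", "New York City, US"),
    ("new york", "New York City, US"),
    ("New York, NY", "New York City, US"),
    ("London", "London, GB"),
    ("london uk", "London, GB"),
    ("Londres", "London, GB"),
]
PROTOTYPES = build_prototypes(LABELED)
```

Calling `link("new york city", PROTOTYPES)` returns the New York entry, while a mention that shares no n-grams with any prototype falls below the threshold and is left unlinked. Raising the threshold trades coverage for precision, which is the point of selective prediction.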
Navigating the Post-API Dilemma: Search Engine Results Pages Present a Biased View of Social Media Data
Recent decisions to discontinue access to social media APIs are having detrimental effects on Internet research and the field of computational social science as a whole. This lack of access to data has been dubbed the Post-API era of Internet research. Fortunately, popular search engines have the means to crawl, capture, and surface social media data on their Search Engine Results Pages (SERP) if provided the proper search query, and may provide a solution to this dilemma. In the present work we ask: does SERP provide a complete and unbiased sample of social media data? Is SERP a viable alternative to direct API-access? To answer these questions, we perform a comparative analysis between (Google) SERP results and nonsampled data from Reddit and Twitter/X. We find that SERP results are highly biased in favor of popular posts; against political, pornographic, and vulgar posts; are more positive in their sentiment; and have large topical gaps. Overall, we conclude that SERP is not a viable alternative to social media API access.
Provably Valid and Diverse Mutations of Real-World Media Data for DNN Testing
Yuan, Yuanyuan, Pang, Qi, Wang, Shuai
Deep neural networks (DNNs) often accept high-dimensional media data (e.g., photos, text, and audio) and understand their perceptual content (e.g., a cat). To test DNNs, diverse inputs are needed to trigger mis-predictions. Some preliminary works use byte-level mutations or domain-specific filters (e.g., foggy), whose enabled mutations may be limited and likely error-prone. SOTA works employ deep generative models to generate (infinite) inputs. Also, to keep the mutated inputs perceptually valid (e.g., a cat remains a "cat" after mutation), existing efforts rely on imprecise and less generalizable heuristics. This study revisits two key objectives in media input mutation - perception diversity (DIV) and validity (VAL) - in a rigorous manner based on manifold, a well-developed theory capturing perceptions of high-dimensional media data in a low-dimensional space. We show important results that DIV and VAL inextricably bound each other, and prove that SOTA generative model-based methods fundamentally fail to mutate real-world media data (either sacrificing DIV or VAL). In contrast, we discuss the feasibility of mutating real-world media data with provably high DIV and VAL based on manifold. We concretize the technical solution of mutating media data of various formats (images, audio, text) in a unified manner based on manifold. Specifically, when media data are projected into a low-dimensional manifold, the data can be mutated by walking on the manifold with certain directions and step sizes. When contrasted with the input data, the mutated data exhibit encouraging DIV in the perceptual traits (e.g., lying vs. standing dog) while retaining reasonably high VAL (i.e., a dog remains a dog). We implement our techniques in DEEPWALK for testing DNNs. DEEPWALK outperforms prior methods in testing comprehensiveness and can find more error-triggering inputs with higher quality.
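The notion of mutating data by walking on a low-dimensional manifold with a chosen direction and step size can be made concrete with a toy sketch. This is not DEEPWALK: it assumes the "media data" are 2-D points on the unit circle, so the manifold coordinate is simply the angle, and all function names are hypothetical. It contrasts a walk along the manifold, which stays valid by construction, with a perturbation in the ambient space, which generally drifts off it.

```python
import math
import random


def project(point):
    """Map a point on the unit circle to its 1-D manifold coordinate (angle)."""
    x, y = point
    return math.atan2(y, x)


def reconstruct(theta):
    """Map a manifold coordinate back to the 2-D ambient space."""
    return (math.cos(theta), math.sin(theta))


def mutate_on_manifold(point, direction, step):
    """Walk along the manifold: direction is +1 or -1, step is in radians."""
    return reconstruct(project(point) + direction * step)


def mutate_in_ambient(point, step, rng):
    """Baseline: perturb raw coordinates, ignoring the manifold structure."""
    x, y = point
    return (x + rng.uniform(-step, step), y + rng.uniform(-step, step))


def off_manifold_error(point):
    """Distance of a point from the unit circle; 0 means it is still valid."""
    return abs(math.hypot(*point) - 1.0)


seed_input = (1.0, 0.0)
walked = mutate_on_manifold(seed_input, direction=+1, step=0.5)
noisy = mutate_in_ambient(seed_input, step=0.5, rng=random.Random(0))
print("on-manifold error:", off_manifold_error(walked))
print("ambient error:    ", off_manifold_error(noisy))
```

In this toy setting a larger step gives a more diverse mutation without any loss of validity, because validity is guaranteed by the reconstruction. For real media data the projection and reconstruction would have to be learned, and that is where the trade-off between diversity and validity discussed in the abstract arises.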
Heterogeneous Social Event Detection via Hyperbolic Graph Representations
Qiu, Zitai, Wu, Jia, Yang, Jian, Su, Xing, Aggarwal, Charu C.
Social events reflect the dynamics of society and, here, natural disasters and emergencies receive significant attention. The timely detection of these events can provide organisations and individuals with valuable information to reduce or avoid losses. However, due to the complex heterogeneities of the content and structure of social media, existing models can only learn limited information; large amounts of semantic and structural information are ignored. In addition, due to high labour costs, it is rare for social media datasets to include high-quality labels, which also makes it challenging for models to learn information from social media. In this study, we propose two hyperbolic graph representation-based methods for detecting social events from heterogeneous social media environments. For cases where a dataset has labels, we designed a Hyperbolic Social Event Detection (HSED) model that converts complex social information into a unified social message graph. This model addresses the heterogeneity of social media, and, with this graph, the information in social media can be used to capture structural information based on the properties of hyperbolic space. For cases where the dataset is unlabelled, we designed an Unsupervised Hyperbolic Social Event Detection (UHSED). This model is based on the HSED model but includes graph contrastive learning to make it work in unlabelled scenarios. Extensive experiments demonstrate the superiority of the proposed approaches.
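To give a feel for the "properties of hyperbolic space" mentioned above, the sketch below computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which hyperbolic model HSED or UHSED uses, so the choice of model and the function name are assumptions made purely for illustration.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Uses d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    norm_u = sum(a * a for a in u)
    norm_v = sum(b * b for b in v)
    if norm_u >= 1.0 or norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - norm_u) * (1.0 - norm_v)))


# The same Euclidean gap of 0.1 is much longer near the boundary.
near_centre = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_edge = poincare_distance((0.85, 0.0), (0.95, 0.0))
print(near_centre, near_edge)
```

Distances grow rapidly toward the boundary of the ball, so there is room for many mutually distant points. This is the usual argument for why hyperbolic embeddings suit hierarchical or tree-like graph structure.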
A Study of Slang Representation Methods
Kolla, Aravinda, Ilievski, Filip, Sandlin, Hông-Ân, Mermoud, Alain
Considering the large amount of content created online by the minute, slang-aware automatic tools are critically needed to promote social good, and assist policymakers and moderators in restricting the spread of offensive language, abuse, and hate speech. Despite the success of large language models and the spontaneous emergence of slang dictionaries, it is unclear how far their combination goes in terms of slang understanding for downstream social good tasks. In this paper, we provide a framework to study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding. Our experiments show the superiority of models that have been pre-trained on social media data, while the impact of dictionaries is positive only for static word embeddings. Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements, which can be traced to characteristics of slang as a quickly evolving and highly subjective language.
How to leverage AI for social media sentiment analysis - ET CIO
In a world where a single tweet can make or break a brand, it is crucial for companies and brands to invest in social media automation and analysis to derive actionable insights on brand perception. You would not like to wait for 12 hours to reply to that negative comment while #quit prefixed with your brand name trends on Twitter and Instagram, would you? Studies have shown that customers tend to be more vocal and frank with their views on social media. How they perceive a particular brand and its products/services fundamentally influences their behavior. So, for brands, being able to dig deep into the comments, replies, conversations, etc. from customers can help uncover an unbiased view of their customers' behavior and persona, helping them understand customer intent and sentiments better.
Decoding the Social Effects Of Media with Machine Learning
What if media were optimized to benefit people? This thought-provoking question is at the core of Harmony Labs' mission. A nonprofit organization headquartered in New York City, Harmony Labs strives to better understand the impact of media on society, and build communities and tools to reform and transform media systems. As Brian Wanieswki, Executive Director at Harmony Labs, puts it: "The media systems that we have now, for better or worse, have become outrage machines and sorting machines that put people into groups of like minds. The business incentive structures of these systems are such that the more outrage there is, the more profit there is. Political events across the world in recent years have borne out what these media systems produce, and it's really pretty toxic, and pretty hard to get anything done within. There are all kinds of natural divisions between people, but these media systems tend to reinforce these divisions. So, the first question that we're asking is, What's the scope of this problem? And then, What can we do to solve it?"
Social media data reveals signal for public consumer perceptions
Pokhriyal, Neeti, Dara, Abenezer, Valentino, Benjamin, Vosoughi, Soroush
Researchers have used social media data to estimate various macroeconomic indicators about public behaviors, mostly as a way to reduce surveying costs. One of the most widely cited economic indicators is the consumer confidence index (CCI). Numerous studies in the past have focused on using social media, especially Twitter data, to predict CCI. However, according to a recent comprehensive survey, the strong correlations disappeared when those models were tested with newer data. In this work, we revisit this problem of assessing the true potential of using social media data to measure CCI, by proposing a robust non-parametric Bayesian modeling framework grounded in Gaussian Process Regression (which provides both an estimate and an uncertainty associated with it). Integral to our framework is a principled experimentation methodology that demonstrates how digital data can be employed to reduce the frequency of surveys, so that periodic polling would be needed only to calibrate our model. Via extensive experimentation we show how different micro-decisions, such as the smoothing interval and various types of lags, have an important bearing on the results. By using decadal data (2008-2019) from Reddit, we show that both monthly and daily CCI can, indeed, be reliably estimated at least several months in advance, and that our model estimates are far superior to those generated by the existing methods.
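The remark that Gaussian Process Regression provides both an estimate and an associated uncertainty can be shown with a small from-scratch sketch. It is not the authors' framework: the RBF kernel, the `noise` and `length` values, and the four toy observations are all assumptions chosen for illustration, and the linear algebra is written out in plain Python so the example has no dependencies.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit signal variance (assumed)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with L @ L.T == A, for a positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper(L, b):
    """Back substitution: solve L.T x = b, using the lower factor L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x


def gp_predict(xs, ys, x_new, noise=1e-2, length=1.0):
    """Posterior mean and standard deviation of the latent function at x_new."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, ys))       # K^{-1} y
    k_star = [rbf(x, x_new, length) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_new, x_new, length) - sum(t * t for t in v)
    return mean, math.sqrt(max(var, 0.0))


# Toy observations (hypothetical): time index vs. a zero-centred signal.
xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 0.8, 0.9, 0.1]
print(gp_predict(xs, ys, 1.0))    # close to the data: small uncertainty
print(gp_predict(xs, ys, 10.0))   # far from the data: reverts to the prior
```

The standard deviation grows as the query point moves away from the observations. That behaviour is what would let periodic survey results serve as calibration points in a setup like the one described.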
Federal Election – AI Driven Insights Hill Knowlton Strategies – Canada
Hill Knowlton Strategies is joining forces with AI pioneers Advanced Symbolics Inc. to conduct in-depth analysis of 28 key federal ridings in #ELXN43. This series is called Ridings to Watch. Over the course of the election we will add blocks of seven strategically important ridings. Below you will find all 28 Ridings to Watch! The team at ASI created "Polly," an AI that predicts voter intentions based on publicly available social media data.